Describe variation in spawn timing and how it relates to environmental covariates.
This study was conducted in the Middle Fork of the Salmon River (MFSR) in central Idaho (1). The MFSR is a tributary of the Salmon River and is part of the larger Columbia River Basin. The MFSR is home to several species of salmon, including Chinook salmon (Oncorhynchus tshawytscha).
Figure 1: Map of the Middle Fork Salmon River (MFSR) study area showing redd locations used in the analysis (2002-2005) and stream reaches.
Spawn timing data for Chinook salmon were collected from 2001 to 2005 in the MFSR.
We removed data from 2001, and data from Knapp Creek and Cape Horn Creek, as these sites were not consistently sampled.
Because redds were not observed daily, we inferred spawn dates as the initial date a completed redd was observed.
We joined each redd location to the NHD and assigned them a COMID, analogous to a stream reach. The COMID is used to link the spawn time data with covariate data associated with the stream reach on which it is located.
## # A tibble: 3,016 × 6
## redd_id COMID spawn_date stream year yday
## <dbl> <fct> <date> <fct> <fct> <dbl>
## 1 1295 23519365 2002-08-23 Bear Valley 2002 235
## 2 1296 23519365 2002-08-23 Bear Valley 2002 235
## 3 1298 23519319 2002-08-23 Bear Valley 2002 235
## 4 1299 23519319 2002-08-23 Bear Valley 2002 235
## 5 1300 23519319 2002-08-23 Bear Valley 2002 235
## 6 1301 23519317 2002-08-23 Bear Valley 2002 235
## 7 1302 23519317 2002-08-26 Bear Valley 2002 238
## 8 1303 23519317 2002-09-03 Bear Valley 2002 246
## 9 1304 23519317 2002-08-26 Bear Valley 2002 238
## 10 1305 23519317 2002-09-03 Bear Valley 2002 246
## # ℹ 3,006 more rows
The data comprise 3016 redd observation from 8 streams across 4 years. The redds were observed between day 206 and 263.
Figure 2: Histogram and density of spawn timing data (DOY) for all streams and years.
Figure 3: Spawn time variation by year and stream.
Figure 4: Spawn time by COMID and stream.
Figure 5: Spawning phenology of adult Chinook Salmon. In all panels, the black density function represented stream-level spawn timing, while the colored density functions represent the spawn timing of individual years. The dashed vertical purple lines represent the 5th and 95th percentiles of the basin-wide spawn timing.
(#fig:plot-cum-props, )Cumulative proportion of completed of redds by stream.
To test for environmental factors driving variation in spawn timing, we quantified associations with metrics describing thermal and physical conditions in stream reaches.
We selected covariates based on the following criteria: (1) they are known to influence spawn timing, (2) they are available for all streams, and (3) they are not highly correlated with each other.
Our focal independent variable were:
We used modeled daily average stream temperature values predicted at the stream segment (COMID) scale (Siegel et al. 2023; available from https://zenodo.org/records/8174951). These data were downloaded and filtered to 2001-2005 and for the MFSR.
Figure 6: Modeled thermal regimes (2001-2005) for MFSR tributaries. Black line = mean, Red ribbon = 40 - 60th percentiles, Grey ribbon = full range.
We calculated metrics relative to a COMID a redd was constructed on and a redd completion date; before, after, and spanning the date.
For example, temp_30_before is the average temperature for a COMID where a redd was constructed calculated over the previous 30 days. We did this for 30, 60, and 90 days.
We also calculated a time invariant metric relative to a fixed date across all years that was chosen to represent an initial spawning window, e.g., August 1.
The time invariant and after metrics were omitted from further consideration as preliminary data exploration showed weak if any relationship with spawn timing.
## # A tibble: 6 × 6
## redd_id COMID spawn_date temp_30 temp_60 temp_90
## <dbl> <dbl> <date> <dbl> <dbl> <dbl>
## 1 1000 23519529 2001-08-31 12.6 12.3 11.2
## 2 1001 23519531 2001-08-31 12.7 12.3 11.3
## 3 1002 23519531 2001-08-31 12.7 12.3 11.3
## 4 1003 23519531 2001-08-31 12.7 12.3 11.3
## 5 1004 23519531 2001-08-31 12.7 12.3 11.3
## 6 1005 23519531 2001-08-31 12.7 12.3 11.3
Stream flow data were compiled from a single USGS Gage lower in the watershed (MF Salmon River at MF Lodge NR Yellow Pine ID - 13309220). Becuase flow data are not COMID- or stream-specific, it makes sense to think about and represent flow as an out-of-basin year effect that determines when adults make it back to the MFSR and initially onto the spawning grounds.
Figure 7: Inter-annual variability in daily discharge (cfs) at MF Lodge USGS Gage 13309220.
We calculated flow metrics relative to a COMID a redd was constructed on and a redd completion date; before, after, and spanning the date.
For example, temp_30_before is the average temperature for a COMID where a redd was constructed calculated over the previous 30 days. We did this for 30, 60, and 90 days.
The spanning and after metrics were omitted from further consideration as preliminary data exploration showed weak if any relationship with spawn timing.
## # A tibble: 124 × 4
## spawn_date flow_30 flow_60 flow_90
## <date> <dbl> <dbl> <dbl>
## 1 2001-08-31 363. 449. 628.
## 2 2001-08-27 394. 475 683.
## 3 2001-08-28 390. 467. 668.
## 4 2001-08-29 385. 461. 653.
## 5 2002-08-23 651. 997. 1993.
## 6 2002-08-26 624. 912. 1882.
## 7 2002-09-03 584. 766. 1412.
## 8 2002-08-19 697. 1140. 2151.
## 9 2002-08-20 684. 1106. 2099.
## 10 2002-08-24 642. 967. 1960.
## # ℹ 114 more rows
Elevation and stream slope data were available at the COMID (stream reach) scale from the NHD.
## # A tibble: 231,807 × 3
## COMID slope mean_elevation
## <int> <dbl> <dbl>
## 1 24202021 0.0134 1231.
## 2 24202023 0.112 1024.
## 3 24202033 0.0633 1603.
## 4 24202025 0.177 1893.
## 5 24202027 0.109 1080.
## 6 24202031 0.107 1213.
## 7 24202035 0.141 1037.
## 8 24202037 0.0583 1444.
## 9 24202039 0.139 1370.
## 10 24202041 0.130 1083.
## # ℹ 231,797 more rows
## Rows: 3,013
## Columns: 14
## $ redd_id <dbl> 1295, 1296, 1298, 1299, 1300, 1301, 1302, 1303, 1304, 1…
## $ COMID <fct> 23519365, 23519365, 23519319, 23519319, 23519319, 23519…
## $ spawn_date <date> 2002-08-23, 2002-08-23, 2002-08-23, 2002-08-23, 2002-0…
## $ stream <fct> Bear Valley, Bear Valley, Bear Valley, Bear Valley, Bea…
## $ year <fct> 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2…
## $ yday <dbl> 235, 235, 235, 235, 235, 235, 238, 246, 238, 246, 246, …
## $ flow_30 <dbl> 650.8333, 650.8333, 650.8333, 650.8333, 650.8333, 650.8…
## $ flow_60 <dbl> 997.4333, 997.4333, 997.4333, 997.4333, 997.4333, 997.4…
## $ flow_90 <dbl> 1992.733, 1992.733, 1992.733, 1992.733, 1992.733, 1992.…
## $ temp_30 <dbl> 12.70318, 12.70318, 14.02703, 14.02703, 14.02703, 13.78…
## $ temp_60 <dbl> 12.87280, 12.87280, 14.20337, 14.20337, 14.20337, 13.95…
## $ temp_90 <dbl> 11.07830, 11.07830, 12.15472, 12.15472, 12.15472, 11.92…
## $ slope <dbl> 0.00299158, 0.00299158, 0.00171806, 0.00171806, 0.00171…
## $ mean_elevation <dbl> 1951.730, 1951.730, 1946.735, 1946.735, 1946.735, 1944.…
temp_30 = no relationship, droptemp_60 = good non-linear, drop for temp_90?temp_90 = better non-linearflow_30 and flow_60 = similar decaying exponentialflow_90 = inflections, interesting grouping really spreads out, year effectSLOPE = no relationship, dropmean_elevation = slightly negative linear?model_data |>
select(temp_30, temp_60, temp_90, flow_30, flow_60, flow_90, mean_elevation) |>
GGally::ggpairs()
temp_90 as it has the strongest relationship.flow_30 and flow_60 are highly correlated with flow_90.flow_90 as it has the strongest relationship.Check VIFs with and without flow_30 and flow_60:
car::vif(lm(yday ~ flow_30 + flow_60 + flow_90 + temp_90 + mean_elevation + stream + year, data = model_data))
## GVIF Df GVIF^(1/(2*Df))
## flow_30 54.18575 1 7.361097
## flow_60 42.25228 1 6.500176
## flow_90 13.47150 1 3.670354
## temp_90 15.37376 1 3.920938
## mean_elevation 21.09913 1 4.593379
## stream 64.19481 7 1.346192
## year 54.26889 3 1.945771
car::vif(lm(yday ~ flow_90 + temp_90 + mean_elevation + stream + year, data = model_data))
## GVIF Df GVIF^(1/(2*Df))
## flow_90 9.605784 1 3.099320
## temp_90 14.146143 1 3.761136
## mean_elevation 20.418174 1 4.518647
## stream 53.393667 7 1.328594
## year 8.676807 3 1.433486
flow_30 and flow_60Temp_90(#fig:plot-temp_90)Spawn timing vs. 90-day mean temperature pre spawn.
Check simple models for temp to examine functional structure and compare with AIC:
m1 <- lm(yday ~ temp_90, data = model_data)
m2 <- lm(yday ~ I(temp_90^2), data = model_data)
m3 <- lm(yday ~ temp_90 + year, data = model_data)
m4 <- lm(yday ~ temp_90 + year + stream, data = model_data)
m5 <- lm(yday ~ temp_90 * stream, data = model_data)
m6 <- lm(yday ~ temp_90 * stream + year, data = model_data)
m7 <- lm(yday ~ temp_90 * year + stream, data = model_data)
m8 <- lm(yday ~ temp_90 * stream + I(temp_90^2), data = model_data)
m9 <- lm(yday ~ temp_90 * stream + year + I(temp_90^2), data = model_data)
| df | AIC | delta | |
|---|---|---|---|
| m9 | 21 | 15602.84 | 0.0000 |
| m6 | 20 | 15846.65 | 243.8107 |
| m7 | 16 | 16110.32 | 507.4761 |
| m8 | 18 | 16192.42 | 589.5746 |
| m4 | 13 | 16245.22 | 642.3722 |
| m5 | 17 | 16547.05 | 944.2027 |
| m3 | 6 | 17613.48 | 2010.6394 |
| m1 | 3 | 17983.29 | 2380.4428 |
| m2 | 3 | 18211.19 | 2608.3443 |
m9: yday ~ temp_90 * stream + year + I(temp_90^2)
m6: yday ~ temp_90 * stream + year (no quadratic)Let’s plot the predictions from m9 to see how well it fits the data.
(#fig:temp_90_preds)Predicted spawn timing by stream and year.
Observations from the plot
Notable spatial breaks in some streams
Interpretation
Visual inspection of stream- and year-specific fits revealed curvature in temperature–spawn timing relationships, with apparent within-stream variation in baseline timing across COMIDs. This suggested spatial heterogeneity not explained by stream-level fixed effects alone.
To account for this, we evaluated the addition of random intercepts for COMID with both top models (m6 and m9).
m6_re <- lmer(yday ~ temp_90 * stream + year + (1|COMID), data = model_data)
m9_re <- lmer(yday ~ temp_90 * stream + year + I(temp_90^2) + (1|COMID), data = model_data)
| df | AIC | delta | |
|---|---|---|---|
| m9_re | 22 | 13011.83 | 0.00000 |
| m6_re | 21 | 13050.52 | 38.69329 |
| m9 | 21 | 15602.84 | 2591.01254 |
| m6 | 20 | 15846.65 | 2834.82322 |
m6 vs m6_re (same fixed effects, RE added): ΔAIC = 2796.1m6_re vs m9_re: ΔAIC = 38.7m9 (quadratic only) still has a much worse AIC than m6_re (linear w/ RE): 15602 vs 13050 → ΔAIC ≈ 2552m9_re) gives best performanceTakeaways
m9_re) is supported: reflects spatial variation and nonlinear phenological driversPlot predictions with random intercepts:
(#fig:temp_90_preds-re)Predicted spawn timing by stream and year with random intercepts.
What this shows:
Streams with wide COMID spread:
Streams with tight COMID clustering:
The simplified model with COMID random intercepts is capturing reach-level variation without overwhelming the model. This is a strong candidate structure to carry forward or use as a baseline for comparing against more complex interaction models.
Thoughts from Dan: Temporal compression in lower reaches?
Dan’s field observation: fish in warmer, lower reaches spawn later and over a shorter window.
Upstream vs. downstream dynamics in Big, Camas, Loon, etc. likely manifest in:
Statistical implication:
Should we add COMID slopes?
The next logical step is to consider random slopes. This lets each COMID have its own slope for temp_90, capturing different rates of response to temperature.
But…
Overfitting risk! Here’s the situation:
| Factor | Status |
|---|---|
COMID |
108 levels |
| Many have | < 5 redds |
| Some have | only 1 redd |
Random slopes are data-hungry. With sparse COMID-level samples, this model will:
So — stick with random intercepts, and revisit slopes later only if:
Conclusion
I think we will proceed and compare the full interaction model with and without the quadratic term. We’ll Stick with (1 | COMID) for now — it gets the key structure without overfitting. And flag a future option to explore grouped random slopes, or COMID × temp_90 interaction if biologically and statistically justified.
Flow_90(#fig:plot-flow_90)Spawn timing vs. 90-day mean discharge pre spawn at MF Lodge.
Check simple models for flow to examine functional structure and comapare with AIC:
m1 <- lm(yday ~ flow_90, data = model_data)
m2 <- lm(yday ~ I(flow_90^2), data = model_data)
m3 <- lm(yday ~ flow_90 + year, data = model_data)
m4 <- lm(yday ~ flow_90 + year + stream, data = model_data)
m5 <- lm(yday ~ flow_90 * stream, data = model_data)
m6 <- lm(yday ~ flow_90 * stream + year, data = model_data)
m7 <- lm(yday ~ flow_90 * year + stream, data = model_data)
m8 <- lm(yday ~ flow_90 * stream + I(flow_90^2), data = model_data)
m9 <- lm(yday ~ flow_90 * stream + year + I(flow_90^2), data = model_data)
| df | AIC | delta | |
|---|---|---|---|
| m7 | 16 | 9870.368 | 0.000 |
| m9 | 21 | 12875.046 | 3004.678 |
| m6 | 20 | 14100.258 | 4229.890 |
| m4 | 13 | 14682.235 | 4811.867 |
| m3 | 6 | 15098.512 | 5228.143 |
| m8 | 18 | 17381.664 | 7511.296 |
| m5 | 17 | 17618.492 | 7748.124 |
| m1 | 3 | 18662.693 | 8792.325 |
| m2 | 3 | 19039.519 | 9169.151 |
m7: yday ~ flow_90 * year + streamLet’s plot the predictions from m7 to see how well it fits the data.
(#fig:flow_90_preds_year)Predicted spawn timing by year.
(#fig:flow_90_preds_all)Predicted spawn timing by stream and year.
flow_90flow_90 × year interaction may be statistically retained and meaningfully variable.flow_90 behaves like a proxy for year, and its explanatory power may be redundant with year.flow_90 is still a basin-wide measure, not stream-specificEcological Interpretation
But this doesn’t necessarily mean site-specific flow is a driver. Just that flow_90 tracks broader seasonal variation (i.e., interannual hydrology), already captured by year.
Flow could be valid in a year-based model, or as a year-level summary, but not site-level unless spatially resolved.
Including flow_90 could introduce spurious precision and possibly overfitting. Why flow_90 might be problematic:
flow_90 is calculated from a single downstream gauge, and applied to all reddsflow_90 varies mostly across years, (albeit slightly with streams), it is strongly confounded with yearflow_90 and year risks collinearity, and may produce misleading inferencesflow_90 is aligned to each redd’s spawn date, it still reflects a lower-basin gauge, not the actual hydrologic conditions experienced at the redd siteBottom Line:
Unless we have site-specific or spatially disaggregated flow data, flow_90 is probably not a valid covariate for redd-level models.
Including it may:
Recommendation:
Drop flow_90 from model (or at most, keep it as a year-level covariate if we summarize it to a single annual value for all observations)
Text for ms:
Although we initially considered including 90-day mean streamflow (flow_90) as a predictor of spawn timing, this variable was ultimately excluded due to concerns about ecological validity and model overfitting. Streamflow data were derived from a single downstream USGS gauge and did not capture spatial variation across the study streams or reaches. Moreover, because flow_90 was closely aligned with year, it introduced strong collinearity with the year effect and risked attributing site-level variation to flow patterns not actually experienced by individual redds. As such, we excluded flow_90 to avoid misleading inference.
Final dataset includes:
yday), continuouscomid, stream, yeartemp_90, elevation## # A tibble: 3,013 × 7
## yday COMID stream year temp_90 mean_elevation slope
## <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl>
## 1 235 23519365 Bear Valley 2002 -0.316 0.717 -0.519
## 2 235 23519365 Bear Valley 2002 -0.316 0.717 -0.519
## 3 235 23519319 Bear Valley 2002 0.471 0.694 -0.711
## 4 235 23519319 Bear Valley 2002 0.471 0.694 -0.711
## 5 235 23519319 Bear Valley 2002 0.471 0.694 -0.711
## 6 235 23519317 Bear Valley 2002 0.303 0.683 -0.539
## 7 238 23519317 Bear Valley 2002 0.468 0.683 -0.539
## 8 246 23519317 Bear Valley 2002 0.828 0.683 -0.539
## 9 238 23519317 Bear Valley 2002 0.468 0.683 -0.539
## 10 246 23519317 Bear Valley 2002 0.828 0.683 -0.539
## # ℹ 3,003 more rows
We should only include a grouping variable as both fixef and ranef when you want to model differenct aspects of the effect. E.g., stream as fixed to compared average effects by stream, but as ranefs to account for non-independence of obs within each stream but not to compare average effects. Include as both when you want population-level estimates AND when you expect stream-level random variation around those means. Do not include both when intercepts are redundant (~ stream + (1| stream)), or too few levels for the random effect to be estimated (e.g., <5 levels).
| Variable | Fixed? | Random? | Why |
|---|---|---|---|
COMID |
No | Yes (but must check data availability among levels; likely sparse) | Not estimating individual COMID effects, just accounting for correlation. |
Stream |
Yes | Maybe (only if random slope or complex structure) | May want to estimate differences and account for grouping. |
Year |
Yes (comparing years) | Maybe (account for inter-annual variability) | Use one or the other depending on goal. |
In this case, it makes sense to include year as fixed because:
Let’s interrogate the grouping structure within the data to make a final decision about COMIDs.
| stream | n_COMIDs |
|---|---|
| Big | 21 |
| Loon | 19 |
| Camas | 16 |
| Marsh | 13 |
| Bear Valley | 11 |
| Sulphur | 11 |
| Beaver | 7 |
| Elk | 6 |
Figure 8: Number of observations (redds) per COMID.
Are the enough COMIDs per stream to consider stream/COMID nested random effects? Not really…
Are groups well sampled? (Do most COMIDs have >1-2 redds?) No. 23 COMIDs have <5 redds (25.9615385%), 13 have <= 2. (12.5%). With <5 obs/level, variance estimates become unstable -> overfit and absorb noise (low AIC / high R2) -> singular fits.
Are year or stream-level random effects justifiable? We want to compare streams and years in this dataset, not generalize beyond them, so we should use fixed effects. Further, stream are known, of interest, and were not randomly sampled. Including (1 | stream) would be statistically redundant (stream intercepts would be double-modeled) and would lead to misleading AIC comparisons.
Random slopes?. The needs multiple obs per group across a range of slope variable (temp_90), enough replication to estimate variation in slopes, not just intercepts. Need ~8+ obs per group spread across the covariate. We could do this for stream or year. Prob not.
So, we should use a simple linear model with no random effects. This allows for interpretable results, no overfitting from random effects, and is a reasonable approach given the design.
If we did try (1 | COMID), expect singularity, low variance estimates for COMID, and predictions will likely miss the groups means.
That said, as shown in Temp_90, we probably should be including COMID as a random effect. So we many have to simple the model (e.g., omit elevation), to ovoid over fitting.
First we’ll fit simple linear models to get a sense of the data and the relationships. Adding new interactions each time.
# All combinations of 1–5 fixed effects: temp_90, stream, year, mean_elevation, SLOPE
# 31 total
m1 <- lm(yday ~ temp_90, data = model_data)
m2 <- lm(yday ~ stream, data = model_data)
m3 <- lm(yday ~ year, data = model_data)
m4 <- lm(yday ~ mean_elevation, data = model_data)
m5 <- lm(yday ~ slope, data = model_data)
m6 <- lm(yday ~ temp_90 + stream, data = model_data)
m7 <- lm(yday ~ temp_90 + year, data = model_data)
m8 <- lm(yday ~ temp_90 + mean_elevation, data = model_data)
m9 <- lm(yday ~ temp_90 + slope, data = model_data)
m10 <- lm(yday ~ stream + year, data = model_data)
m11 <- lm(yday ~ stream + mean_elevation, data = model_data)
m12 <- lm(yday ~ stream + slope, data = model_data)
m13 <- lm(yday ~ year + mean_elevation, data = model_data)
m14 <- lm(yday ~ year + slope, data = model_data)
m15 <- lm(yday ~ mean_elevation + slope, data = model_data)
m16 <- lm(yday ~ temp_90 + stream + year, data = model_data)
m17 <- lm(yday ~ temp_90 + stream + mean_elevation, data = model_data)
m18 <- lm(yday ~ temp_90 + stream + slope, data = model_data)
m19 <- lm(yday ~ temp_90 + year + mean_elevation, data = model_data)
m20 <- lm(yday ~ temp_90 + year + slope, data = model_data)
m21 <- lm(yday ~ temp_90 + mean_elevation + slope, data = model_data)
m22 <- lm(yday ~ stream + year + mean_elevation, data = model_data)
m23 <- lm(yday ~ stream + year + slope, data = model_data)
m24 <- lm(yday ~ stream + mean_elevation + slope, data = model_data)
m25 <- lm(yday ~ year + mean_elevation + slope, data = model_data)
m26 <- lm(yday ~ temp_90 + stream + year + mean_elevation, data = model_data)
m27 <- lm(yday ~ temp_90 + stream + year + slope, data = model_data)
m28 <- lm(yday ~ temp_90 + stream + mean_elevation + slope, data = model_data)
m29 <- lm(yday ~ temp_90 + year + mean_elevation + slope, data = model_data)
m30 <- lm(yday ~ stream + year + mean_elevation + slope, data = model_data)
m31 <- lm(yday ~ temp_90 + stream + year + mean_elevation + slope, data = model_data)
| Model | df | AIC | delta |
|---|---|---|---|
| m31 | 15 | 15849.91 | 0.00000 |
| m26 | 14 | 15933.83 | 83.91617 |
| m27 | 14 | 16190.84 | 340.92786 |
| m16 | 13 | 16245.22 | 395.30513 |
| m28 | 12 | 16950.14 | 1100.23075 |
| m18 | 11 | 16973.74 | 1123.82891 |
| m17 | 11 | 16993.81 | 1143.89804 |
| m6 | 10 | 17012.83 | 1162.92236 |
| m29 | 8 | 17482.46 | 1632.55077 |
| m19 | 7 | 17568.43 | 1718.51997 |
| Rank | Model | Formula (inferred) | df | AIC | ΔAIC |
|---|---|---|---|---|---|
| 1 | m31 | temp_90 + stream + year + mean_elevation + SLOPE |
15 | 15849.91 | 0.00 |
| 2 | m26 | temp_90 + stream + year + mean_elevation |
14 | 15933.83 | 83.91 |
| 3 | m27 | temp_90 + stream + year + SLOPE |
14 | 16190.84 | 340.93 |
| 4 | m16 | temp_90 + stream + year |
13 | 16245.22 | 395.31 |
| 5 | m28 | temp_90 + stream + mean_elevation + SLOPE |
13 | 16950.14 | 1100.23 |
Interpretation
Model m31 (best) - Contains all five covariates - Suggests that each adds unique, non-redundant signal - Likely a solid benchmark, but might be at risk of overfitting or including covariates of weak effect
🔻 Model m26
- Drops slope → minimal ΔAIC (Δ = 84)
- Suggests slope may not be that influential when year and elevation are included
🔻 Model m27
- Drops mean_elevation instead → ΔAIC = 341
- Implies elevation likely provides more explanatory power than slope in this model context
🔻 Model m16
- Drops both mean_elevation and slope
- Still relatively good (Δ = 395), meaning that temp_90 + stream + year explains the bulk of variation
🔻 Model m28
- general model as suggest by Dan
- Omits year, includes everything else → ΔAIC = 1100
- Big drop confirms that year is critically important, likely absorbing interannual hydrology, climate variation, or escapement.
Takeaways
year and stream are foundational — they’re in all top models.temp_90 is always present — unsurprisingly.Recommendation:
Look at the model estimates for top 5 models:
| Full model | No slope | No elevation | No elevation or slope | No year | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 111.42 | 104.62 – 118.23 | <0.001 | 115.37 | 108.52 – 122.22 | <0.001 | 175.79 | 174.21 – 177.37 | <0.001 | 177.23 | 175.68 – 178.77 | <0.001 | 162.89 | 155.55 – 170.22 | <0.001 |
| temp 90 | 7.09 | 6.88 – 7.31 | <0.001 | 6.92 | 6.71 – 7.14 | <0.001 | 5.36 | 5.23 – 5.48 | <0.001 | 5.26 | 5.13 – 5.39 | <0.001 | 5.32 | 5.09 – 5.55 | <0.001 |
| stream [Beaver] | 9.57 | 8.95 – 10.19 | <0.001 | 9.78 | 9.15 – 10.41 | <0.001 | 7.93 | 7.30 – 8.56 | <0.001 | 8.16 | 7.53 – 8.79 | <0.001 | 6.40 | 5.69 – 7.11 | <0.001 |
| stream [Big] | 10.78 | 9.60 – 11.96 | <0.001 | 11.21 | 10.01 – 12.40 | <0.001 | 0.19 | -0.28 – 0.66 | 0.434 | 0.87 | 0.43 – 1.31 | <0.001 | 3.03 | 1.72 – 4.35 | <0.001 |
| stream [Camas] | 8.17 | 7.28 – 9.06 | <0.001 | 9.01 | 8.13 – 9.90 | <0.001 | 1.22 | 0.66 – 1.78 | <0.001 | 2.15 | 1.64 – 2.66 | <0.001 | 3.84 | 2.82 – 4.87 | <0.001 |
| stream [Elk] | 1.58 | 1.09 – 2.06 | <0.001 | 1.26 | 0.77 – 1.75 | <0.001 | -0.01 | -0.49 – 0.48 | 0.983 | -0.23 | -0.71 – 0.25 | 0.353 | 0.18 | -0.39 – 0.76 | 0.533 |
| stream [Loon] | 12.20 | 11.29 – 13.12 | <0.001 | 12.19 | 11.26 – 13.11 | <0.001 | 4.99 | 4.43 – 5.55 | <0.001 | 5.19 | 4.63 – 5.76 | <0.001 | 8.13 | 7.07 – 9.19 | <0.001 |
| stream [Marsh] | 4.48 | 4.00 – 4.97 | <0.001 | 4.94 | 4.45 – 5.42 | <0.001 | 5.82 | 5.32 – 6.31 | <0.001 | 6.17 | 5.68 – 6.66 | <0.001 | 5.09 | 4.51 – 5.67 | <0.001 |
| stream [Sulphur] | 13.99 | 12.91 – 15.06 | <0.001 | 13.43 | 12.34 – 14.52 | <0.001 | 5.23 | 4.61 – 5.86 | <0.001 | 5.02 | 4.39 – 5.65 | <0.001 | 5.28 | 4.13 – 6.44 | <0.001 |
| year [2003] | -3.86 | -4.17 – -3.55 | <0.001 | -3.85 | -4.16 – -3.54 | <0.001 | -2.86 | -3.16 – -2.55 | <0.001 | -2.88 | -3.19 – -2.57 | <0.001 | |||
| year [2004] | 2.18 | 1.79 – 2.57 | <0.001 | 2.13 | 1.73 – 2.52 | <0.001 | 1.74 | 1.33 – 2.15 | <0.001 | 1.71 | 1.29 – 2.12 | <0.001 | |||
| year [2005] | 3.94 | 3.44 – 4.45 | <0.001 | 3.89 | 3.38 – 4.40 | <0.001 | 3.92 | 3.39 – 4.45 | <0.001 | 3.88 | 3.34 – 4.42 | <0.001 | |||
| mean elevation | 0.02 | 0.02 – 0.02 | <0.001 | 0.02 | 0.02 – 0.02 | <0.001 | 0.01 | 0.00 – 0.01 | <0.001 | ||||||
| slope | 109.56 | 86.49 – 132.62 | <0.001 | 93.47 | 69.12 – 117.82 | <0.001 | 95.58 | 67.90 – 123.25 | <0.001 | ||||||
| Observations | 3013 | 3013 | 3013 | 3013 | 3013 | ||||||||||
| R2 / R2 adjusted | 0.840 / 0.839 | 0.835 / 0.835 | 0.821 / 0.820 | 0.817 / 0.817 | 0.769 / 0.768 | ||||||||||
Interpretation: Elevation is probably doing more work than slope, but neither may be critical once stream and year are accounted for.
Recommendations - Retain: temp_90, stream, year - Evaluate: - Keep mean_elevation if effect size is consistent and interpretable - Drop SLOPE unless it provides a strong ecological rationale
Next: - Try a version of the best model without elevation and slope, or just elevation, to test sensitivity - Proceed to adding the quadratic term (I(temp_90^2)) - Then test COMID as a random intercept - test interactions
Compare targeted models first, Why?
temp_90, stream, and year are criticalelevation and slope are questionable in explanatory valueWhat to do:
temp_90 + stream + year (baseline)mean_elevation and/or SLOPEI(temp_90^2)(1 | COMID)This gives you a clean sense of model structure before going into interactions.
| df | AIC | delta | |
|---|---|---|---|
| mod_quad_re | 15 | 13452.18 | 0.000 |
| mod_quad_elev | 15 | 15493.04 | 2040.858 |
| mod_quad | 14 | 15791.11 | 2338.923 |
| mod_elev | 14 | 15933.83 | 2481.644 |
| mod_base | 13 | 16245.22 | 2793.033 |
| Model | Formula Concept | df | AIC | ΔAIC |
|---|---|---|---|---|
mod_quad_re |
`temp_90 + I(temp_90^2) + stream + year + (1 | COMID) | 15 | 13452.2 |
mod_quad_elev |
temp_90 + I(temp_90^2) + stream + year + elevation |
15 | 15493.0 | 2040.9 |
mod_quad |
temp_90 + I(temp_90^2) + stream + year |
14 | 15791.1 | 2338.9 |
mod_elev |
temp_90 + stream + year + elevation |
14 | 15933.8 | 2481.6 |
mod_base |
temp_90 + stream + year |
13 | 16245.2 | 2793.0 |
Interpretation
- mod_quad_re (winner)
- Massive AIC improvement — >2000 points — over any model lacking COMID as a random effect
- Also supports the value of the quadratic temperature term
- This is clear front-runner
mod_quad_elev vs mod_quad
- Elevation does slightly improve fit when random effects are excluded (ΔAIC ≈ 140)
- But when (1 | COMID) is included, the value of elevation vanishes (i.e., you never tested mod_quad_re + elevation, which is telling)
- Suggests elevation is already absorbed by COMID-level differences — consistent with earlier findings
Conclusion
- Keep: temp_90, I(temp_90^2), stream, year, and (1 | COMID)
- Drop: mean_elevation (already accounted for in COMID)
- The winning model (mod_quad_re) is simple, interpretable, and statistically robust
| df | AIC | delta | |
|---|---|---|---|
| mod_interact3 | 25 | 11673.57 | 0.000 |
| mod_interact2 | 18 | 12109.61 | 436.041 |
| mod_interact1 | 22 | 13020.03 | 1346.456 |
| mod_quad_re | 15 | 13452.18 | 1778.614 |
| Model | Formula Concept | df | AIC | ΔAIC | |
|---|---|---|---|---|---|
mod_interact3 |
`temp_90 * stream + temp_90 * year + I(temp_90^2) + (1 | COMID)` | 25 | 11673.6 | 0.0 |
mod_interact2 |
`temp_90 * year + stream + I(temp_90^2) + (1 | COMID)` | 18 | 12109.6 | 436.0 |
mod_interact1 |
`temp_90 * stream + year + I(temp_90^2) + (1 | COMID)` | 22 | 13020.0 | 1346.5 |
mod_quad_re |
`temp_90 + I(temp_90^2) + stream + year + (1 | COMID)` | 15 | 13452.2 | 1778.6 |
Best model: mod_interact3 - Including both temp_90 × stream and temp_90 × year yields dramatic AIC improvement (~1779 units better than additive model) - Suggests that both spatial and temporal variation in temperature sensitivity are biologically meaningful
Likely reflects differences in: - Local stream-specific phenology (via temp_90 × stream) - Year-to-year climate anomalies and flow/thermal dynamics (via temp_90 × year)
mod_interact2 better than mod_interact1 - Interannual variation (year) appears more influential than streamwise variation in temp_90 response - But stream still matters — together, they outperform either alone
## model AIC R2 ICC
## mod_quad_re mod_quad_re 13452.18 0.7413331 0.9220937
## mod_interact1 mod_interact1 13020.03 0.7250133 0.9459792
## mod_interact2 mod_interact2 12109.61 0.6730601 0.9771622
## mod_interact3 mod_interact3 11673.57 0.6705336 0.9825959
As expected, the interactions are overfitting.
So What’s Going On? 1. Low AIC but bad predictions = likely overfitting - AIC rewards goodness-of-fit but penalizes complexity — and the interaction model may be “winning” by fitting spurious patterns (e.g. small temp ranges in some groups leading to flipped quads).
Recommendation - Stick with mod_quad_re - Excellent fit (highest R²), interpretable, smoother predictions - AIC difference (~1778) is likely an artifact of interaction complexity - Most biologically plausible model
→ Suggest making this your primary model
(#fig:plot_final-1)Final model predictions.
(#fig:plot_final-2)Final model predictions.
–>